# Design of Energy-efficient RFET-based Exact and Approximate 4:2 Compressors and Multipliers

Nima Kavand, Armin Darjani, Shubham Rai, Member, IEEE, Akash Kumar, Senior Member, IEEE

Abstract—The ever-increasing demand for low-power and area-efficient circuits for use in battery-powered devices and the CMOS scaling problems have attracted the attention of VLSI designers to beyond-CMOS technologies like Reconfigurable Field-Effect Transistors (RFETs). Improving the efficiency of multipliers is critical because they are a core component of many applications such as image processing and Machine Learning (ML). This paper proposes a compact and energy-efficient RFET-based architecture for the 4:2 compressor and Dadda multiplier, leveraging transistor-level reconfigurability and multi-input support of the RFET. Moreover, we propose a novel approximate 4:2 compressor based on efficient RFET logic cells to cater to the needs of error-resilient applications. Extensive circuit-level simulations with 14nm germanium nanowire (GeNW) RFET technology show that the proposed RFET-based exact multiplier improves the power consumption and power-delay product (PDP) by 65% and 45%, respectively, compared to the conventional CMOS-based counterpart in 14nm FinFET technology. Besides, we show that the proposed approximate compressor reduces the area and PDP by 46% and 42%, respectively, compared to the RFET-based exact compressor. The effectiveness of the approximate multiplier is evaluated in image multiplication, where the average PSNR and SSIM values are 31.39 and 0.87, respectively.

Index Terms—RFET, GeNW, Compressor, Multiplier, Approximate computing

## I. INTRODUCTION

In recent years, the wide usage of various microprocessors in embedded systems and battery-powered portable devices pushed VLSI designers to employ different methods to gain more compact and energy-efficient circuits. Among digital arithmetic blocks, multipliers play an important role in many applications, such as digital signal processing (DSP), image processing, and machine learning (ML). Multiplier circuits are large and power-hungry and contribute considerably to overall system performance. A multiplier usually comprises three phases: 1) *Partial Product Generation (PPG):* Partial products are generated by parallel logic AND operators from "multiplicand" and "multiplier." 2) *Partial Product Reduction (PPR):* Partial products are reduced to only two operands for the last phase. 3) *Final addition:* A ripple-carry adder produces the multiplication result.

As the second stage accounts for the largest portion of the area, propagation delay, and power consumption, an efficient design of this phase is crucial. The Dadda method is one of the fastest and best-known methods for partial product reduction, and 4:2 compressors are commonly used to implement it [1].
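As a rough behavioural illustration of the three phases (not the circuit proposed in this paper), the following Python sketch generates the AND partial products of an $N \times N$ multiplication, groups them by bit weight, and sums them. The function names are ours, and the PPR and final addition phases are modelled only as a plain sum, which is a simplifying assumption.

```python
def partial_products(a, b, n=8):
    """Return columns[w]: the list of partial-product bits with weight 2**w."""
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):              # bit i of the multiplicand
        for j in range(n):          # bit j of the multiplier
            bit = ((a >> i) & 1) & ((b >> j) & 1)   # PPG: one AND per bit pair
            columns[i + j].append(bit)
    return columns


def multiply(a, b, n=8):
    """PPG followed by an idealised reduction and final addition."""
    columns = partial_products(a, b, n)
    return sum(sum(col) << w for w, col in enumerate(columns))


# Column heights of the 8x8 partial-product array that the PPR phase must reduce.
heights = [len(col) for col in partial_products(0, 0)]
print(heights)  # [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
```

The tallest column holds eight bits, which is why the reduction phase dominates the cost of the multiplier.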

Several works have proposed CMOS-based 4:2 compressors that are efficient in terms of delay, power, and area [2]. However, ever-increasing CMOS scaling problems and the slowing down of Dennard scaling have made it challenging to create more compact and low-power circuits with each new technology generation. For this reason, several compressors have been proposed based on beyond-CMOS technologies [3], [4].

Among emerging technologies, the Reconfigurable Field-Effect Transistor (RFET) follows a top-down manufacturing process similar to CMOS [5], [6], and its unique features, like ambipolarity and multi-input support, open new doors to designing compact low-power circuits [7]. Several previous works showed the benefits of RFET in creating basic logic gates [6], [8]–[10].

This work was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project Number 439891087—SecuReFET and the German Federal Ministry for Education and Research (BMBF) under the framework of VE-CirroStrato.

Nima Kavand, Armin Darjani, Shubham Rai, and Akash Kumar are with the Department of Computer Science, Technische Universität Dresden, 01062 Dresden, Germany (e-mail: nima.kavand@tu-dresden.de; armin.darjani@tu-dresden.de; shubham.rai@tu-dresden.de; akash.kumar@tu-dresden.de).

In this paper, we propose an area- and energy-efficient 4:2 compressor and Dadda multiplier exploiting RFET features. Moreover, as error-resilient applications like image processing or ML allow designers to use approximate compressors and multipliers to reduce energy consumption and area [1], [4], [11]–[18], we also propose a novel approximate 4:2 compressor designed based on efficient RFET logic cells. Although most research in the RFET domain has been focused on Silicon nanowire (SiNW) transistors, recently introduced GeNW transistors [19]–[21] can offer much better performance. As the power and performance trade-off is essential in a multiplier, in this paper, we use GeNW RFET [21] to analyze 4:2 compressors and Dadda multiplier. The contributions of this paper are as follows:

- *RFET-based exact 4:2 compressor:* We propose an RFET-based architecture for a compact and low-power exact 4:2 compressor. We demonstrate that a mere 1-to-1 replacement of CMOS transistors with RFETs does not lead to an optimal circuit. Therefore, we propose a design that exploits the reconfigurability and multi-input support of RFETs.
- *RFET-based approximate 4:2 compressor:* We introduce a novel 4:2 approximate compressor utilizing efficient RFET-based logic cells like minority and multiplexer for error-resilient applications.
- *Exact and approximate Dadda multiplier:* We implement exact and approximate  $8 \times 8$  Dadda multipliers employing our two proposed compressors. We propose a structure for intra- and inter-compressor connections within the exact Dadda multiplier to minimize the number of cascaded transmission gates (TG) in the critical path of the multiplier.
- *HW and accuracy analysis:* We performed SPICE simulations using the 14nm GeNW RFET Verilog-A model [21] to evaluate our proposed compressors and multipliers in terms of area and energy efficiency. Besides, we analyze the accuracy of the approximate multiplier and investigate the quality of this multiplier in an image multiplication application.

## II. BACKGROUND

#### A. 4:2 Compressor

Compressors are logic components that count the number of "1"s in their inputs. The 4:2 compressor is the most commonly used block in the PPR phase of multipliers due to its simple and regular structure [2]. Fig. 1.a shows the general schematic of a 4:2 compressor. It contains two full adders and has four primary inputs ($X_1$, $X_2$, $X_3$, and $X_4$) and two primary outputs (*Carry* and *Sum*). Besides, input $C_{in}$ and output $C_{out}$ are used to propagate the carry bit between cascaded blocks in a compressor chain (Fig. 1.b). Outputs *Carry* and $C_{out}$ are one binary bit higher in significance than all the inputs and output *Sum*. The Boolean functions of *Sum*, *Carry*, and $C_{out}$ are expressed as:

$$Sum = X_1 \oplus X_2 \oplus X_3 \oplus X_4 \oplus C_{in} \tag{1}$$

$$Carry = (X_1 \oplus X_2 \oplus X_3 \oplus X_4)C_{in} + \overline{(X_1 \oplus X_2 \oplus X_3 \oplus X_4)}X_4 \tag{2}$$

$$C_{out} = (X_1 \oplus X_2)X_3 + \overline{(X_1 \oplus X_2)}X_1 \tag{3}$$
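The exact compressor can be checked with a short behavioural model. The sketch below is our own illustration, not the RFET circuit: it evaluates *Sum*, *Carry*, and $C_{out}$ in the standard two-full-adder form, where the second product term of *Carry* and of $C_{out}$ uses the complemented XOR, and confirms that the five input bits are always counted correctly.

```python
from itertools import product


def exact_compressor(x1, x2, x3, x4, cin):
    """Behavioural model of an exact 4:2 compressor; all arguments are 0 or 1."""
    p = x1 ^ x2 ^ x3 ^ x4
    s = p ^ cin                               # Sum, Eq. (1)
    carry = (p & cin) | ((1 - p) & x4)        # Carry, Eq. (2)
    q = x1 ^ x2
    cout = (q & x3) | ((1 - q) & x1)          # C_out, Eq. (3)
    return s, carry, cout


def check_all_inputs():
    """Verify the counting identity for all 32 input combinations."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, carry, cout = exact_compressor(x1, x2, x3, x4, cin)
        # Carry and C_out weigh twice as much as Sum.
        assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
    return True


print(check_all_inputs())  # True
```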



Fig. 1. a) General structure of a 4:2 Compressor b) Compressor chain



Fig. 2. a) RFET reconfigurability b) RFET with two, three, and multi gates

#### B. RFET structure and functionality

RFETs are a group of ambipolar transistors that can be electrostatically programmed at run-time to act either as n-type or p-type transistors. The behavior of an RFET is controlled by two kinds of independent gate terminals, which are called program gates (PGs) and control gates (CGs). The PGs determine the dominant charge carrier, i.e., electron or hole, by modulating the drain and/or source Schottky barriers, whereas the CGs control the amount of carrier flow through the channel, just like the traditional CMOS gate. Fig. 2.a shows the reconfigurability of RFET based on different logic values of PG. Most RFETs demonstrate symmetrical I-V characteristics in both program types (p- and n-type) that are utilized for transistor-level reconfigurability [22].

Depending on the number and placement of the PGs and CGs, different RFET designs have been proposed. Example schematics of RFETs with two [23], three [24], and multi gates [25] are illustrated in Fig. 2.b. Our designs in this paper are based on the three-gate RFET. Having more than one independent gate enables us to merge two or more transistors in series into one transistor with a negligible increase in the channel resistance [8], which leads to more compact logic cells. Based on the logic cell design, inputs can be applied to CGs and PGs, knowing that in most RFETs, PG has higher threshold voltage and slower switching [9].

Although converting any CMOS-based circuit to an RFET-based counterpart by 1-to-1 replacement of transistors is possible, we can design circuits with fewer transistors and higher functionality by exploiting the reconfigurability and multi-input support of RFETs [7].

## III. PROPOSED ARCHITECTURE

In this section, firstly, we propose a compact low-power exact 4:2 compressor based on efficient RFET-based full-adders, followed by a novel approximate compressor using efficient RFET logic cells for error-resilient applications. Then we propose an architecture for the RFET-based Dadda multiplier. In our designs we utilize reconfigurability and multi-input support of RFETs to gain efficient RFET-based circuits.

#### A. RFET-based Compressors

1) Exact 4:2 Compressor: As shown in Fig. 1, a 4:2 compressor traditionally consists of two connected full adders. In CMOS-based design, full adders are considered large arithmetic components with significant power consumption. For example, the mirror adder [26], which is one of the most efficient and well-known CMOS-based full adders, requires 28 transistors, while the RFET-based full adder presented in [27] and shown in Fig. 3.a is made up of only 14 transistors. Even considering a $1.5 \times$ [28] area footprint for a three-gate RFET compared to a CMOS transistor to place the extra gate signals, the area of the RFET-based full adder ($14 \times 1.5 = 21$ unit-size transistors) would still be lower than that of the CMOS-based full adder (28). This area reduction comes from the transistor-level reconfigurability of RFET, which enables us to design many logic cells like XOR and majority gates more efficiently.

Fig. 3. RFET-based full adder a) Unbalanced version [27] b) Balanced version

Fig. 4. Connections in the proposed exact 4:2 Compressor

To implement an efficient RFET-based exact compressor, it is crucial to consider both the design of the full adders and the connections inside the compressor. In designing the full adder, the differences between inputs connected to PG and CG pins should be considered, as PGs usually have a higher delay [9]. In the full adder shown in Fig. 3.a, input A has the worst propagation delay compared to inputs B and C because it is connected to 16 PGs. In order to prevent a high-delay critical path in the compressor (and hence the final multiplier design), the difference between the delays of the inputs should be reduced. Thus, we propose a balanced version of the RFET-based full adder, shown in Fig. 3.b, for implementing our exact compressor. In the balanced version, changing the order of the input connections decreases the delay of input A so that it becomes close to the delay of input B.

Six combinations are possible for connecting  $C_{in}$ ,  $X_4$ , and the output of the first full adder to the input pins of the second full adder (A, B, and C) in the compressor. We chose the structure illustrated in Fig. 4, because in this design, signals pass through only one TG inside the compressor for five "input-output" pairs  $(X_1, C_{out})$ ,  $(X_3, C_{out})$ ,  $(X_4, Carry)$ ,  $(C_{in}, Carry)$ , and  $(X_4, Sum)$ . Additionally, this design makes it possible to connect the compressors on the two levels of the PPR phase in the multiplier in a way that results in the least number of cascaded TGs. This is further explained in Section III-B. Note that pins A, B, and C in Fig. 4 indicate the full adder pins with the same name in Fig. 3.b.

2) Approximate 4:2 Compressor: To design a more compact and energy-efficient compressor for error-resilient applications, we investigate RFET benefits in designing approximate compressors. Our approximate compressor is designed based on minority (Min) and multiplexer-inverter (MUX-INV) cells, which have efficient RFET-based implementations. The structures of these logic cells are presented in [8]. The proposed approximate 4:2 compressor is shown in Fig. 5. This compressor is made up of only 15 transistors, compared to the RFET-based exact compressor, which requires 28 transistors. In this design, similar to [1], [4], [11]–[13], input $C_{in}$ and output $C_{out}$ are ignored as the first simplification step. This assumption is acceptable because only the input combination "$X_1X_2X_3X_4 = 1111$" among 16 possibilities results in $C_{out} = 1$. The logic function of our approximate compressor can be defined as:

$$Sum = (X_1 X_2 + X_1 \overline{X_3} + X_2 \overline{X_3}) \overline{X_4} + X_3 X_4 \tag{4}$$

$$Carry = X_3 \overline{X_4} + X_4 \tag{5}$$



The truth table of the proposed approximate 4:2 compressor and the error distance (ED) corresponding to each output are given in Table I. The error rate is 37.5% and the maximum value of ED is $\pm 1$. As the number of "+1"s and "-1"s is equal in the EDs, they usually can neutralize each other in the approximate multiplier structure.
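The error figures quoted for the approximate compressor can be reproduced by enumerating Eqs. (4) and (5). The sketch below is an illustrative behavioural model; it assumes that the exact value is the number of ones among the four inputs and that ED is defined as the approximate value minus the exact value (the sign convention is our assumption).

```python
from itertools import product


def approx_compressor(x1, x2, x3, x4):
    """Behavioural model of Eqs. (4) and (5); all arguments are 0 or 1."""
    n3, n4 = 1 - x3, 1 - x4
    maj = (x1 & x2) | (x1 & n3) | (x2 & n3)
    s = (maj & n4) | (x3 & x4)        # Sum, Eq. (4)
    carry = (x3 & n4) | x4            # Carry, Eq. (5)
    return s, carry


def error_distances():
    """Map each input combination to ED = approximate value - exact value."""
    eds = {}
    for bits in product((0, 1), repeat=4):
        s, carry = approx_compressor(*bits)
        eds[bits] = (2 * carry + s) - sum(bits)
    return eds


eds = error_distances()
errors = {bits: ed for bits, ed in eds.items() if ed != 0}
error_rate = 100 * len(errors) / len(eds)
print(error_rate)  # 37.5
```

Under these assumptions, six of the sixteen combinations are erroneous: three with ED = +1 (0001, 0010, 0011) and three with ED = -1 (1100, 1101, 1111).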

#### B. RFET-based Multipliers

1) Exact Dadda Multiplier: In this section, a design for an exact  $8 \times 8$  RFET-based Dadda multiplier is presented. In our design, we aim to minimize the number of cascaded TGs to avoid a high-resistive critical path and degraded signals without adding any extra buffers. According to [27], in RFET-based designs, buffers should be employed every five stages of TGs to restore signal driveability. Thus, we want to keep the number of cascaded TGs under five. For this aim, we select the proper design for the required components and connect them properly in the PPR and final addition phases of the multiplier.

The main component of the PPR phase is the 4:2 compressor. As mentioned, utilizing the proposed compressor structure shown in Fig. 4, each signal passes through one or zero TG inside the compressor. Besides, utilizing this design, we can minimize the cascaded TGs in the two levels of the PPR phase by defining a priority rule for connecting the incoming signals from level one to the input pins of compressors in level two of the PPR phase. If we consider the PRI(X) as the priority of X, the priority orders of the incoming signals and input pins are defined as:

$$PRI(Carry) > PRI(Sum) > PRI(PP) \tag{6}$$

$$PRI(X_2) = PRI(X_4) > PRI(X_3) > PRI(X_1) \tag{7}$$

These relations mean the incoming signal with higher priority should be connected to the first available input pin with the highest priority. This guarantees that a signal passes through a maximum of three compressors (two compressors in level one and one compressor in level two), hence three cascaded TGs in the PPR phase. The architecture of the proposed multiplier with the connections based on our priority rules is depicted in Fig. 6.
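The priority rule of Eqs. (6) and (7) can be read as a simple greedy assignment. The following sketch is our own illustration of that reading, not the authors' tool flow: the signal names are hypothetical, and the tie between $X_2$ and $X_4$ is broken in favour of $X_2$ purely as an assumption.

```python
SIGNAL_PRIORITY = {"Carry": 2, "Sum": 1, "PP": 0}   # Eq. (6)
PIN_ORDER = ["X2", "X4", "X3", "X1"]                # Eq. (7); X2/X4 tie broken arbitrarily


def assign_pins(signals):
    """Map up to four (name, kind) signals to compressor input pins by priority."""
    if len(signals) > len(PIN_ORDER):
        raise ValueError("a 4:2 compressor has only four primary inputs")
    # Stable sort: signals of the same kind keep their original order.
    ranked = sorted(signals, key=lambda sig: SIGNAL_PRIORITY[sig[1]], reverse=True)
    return {pin: name for pin, (name, _kind) in zip(PIN_ORDER, ranked)}


# Hypothetical level-two compressor fed by one Carry, one Sum, and two partial products.
example = [("pp_3_4", "PP"), ("sum_a", "Sum"), ("carry_b", "Carry"), ("pp_4_3", "PP")]
print(assign_pins(example))
```

In this example the *Carry* signal lands on $X_2$ and the *Sum* signal on $X_4$, while the partial products take the lower-priority pins $X_3$ and $X_1$.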

The final addition phase is carried out by a Ripple Carry Adder (RCA). In [27], the authors used input pin C to propagate the Carry signal between full adders in an RCA. However, this results in cascaded TGs along the carry signal, which lies on the critical path. To resolve this problem without adding extra buffers, we use input pin B instead of C for the carry propagation. In this case, as the critical path passes through the B input pins, we create the RCA using the unbalanced full adder shown in Fig. 3.a. In the unbalanced version, input B has a lower propagation delay compared to the balanced version.

Fig. 6. Architecture of the proposed $8 \times 8$ Dadda multiplier

After applying these optimizations in the PPR and final addition phases, a signal faces at most four cascaded TGs, three in the PPR and one in the final addition. Hence the need for adding buffers is eliminated. In the following, we summarize our steps in designing the RFET-based exact multiplier:

- (a) In PPG: We use RFET-based AND operators for generating PPs.
- (b) In PPR: We select the 4:2 compressor shown in Fig. 4 for this phase and connect them based on our priority rule.
- (c) In final addition: We employ unbalanced full adders to create the RCA and consider input *B* for propagating Carry signals.

2) Approximate Dadda Multiplier: For designing the approximate multiplier, we replace all the exact 4:2 compressors in the PPR phase with the approximate compressor proposed in Section III-A2. Here, unlike in the exact multiplier, we do not have the problem of cascaded TGs in the PPR phase because the MUX-INV used in the approximate compressor is implemented by complementary logic, which is driven directly by $V_{DD}$ and GND. The PPG and final addition phases are similar to those of the exact multiplier.

## IV. EXPERIMENTAL RESULTS

This section presents the simulation results of the proposed exact and approximate compressors and multipliers. We have performed circuit-level SPICE simulations of the circuits using Cadence Spectre to analyze the power and performance metrics of the hardware. To have a fair comparison, 14nm LSTP FinFET technology [29], [30] and the 14nm GeNW RFET model [21] are used for simulating CMOS-based and RFET-based circuits, respectively. We have chosen the GeNW RFET model among other RFET models to gain a better performance and power trade-off. We consider a supply voltage of 0.8 V and a frequency of 2 GHz.

#### A. Compressors

To evaluate the efficiency of our exact compressor, we compared it with the conventional CMOS implementation of the exact compressor, which contains two mirror adders [26]. A mirror adder's symmetrical pull-up and pull-down networks result in equal rise and fall transition times and low power consumption. To emphasize the role of our design, we also compared the proposed exact compressor with a naive RFET-based compressor whose structure is the same as the conventional CMOS implementation, with 1-to-1 replacement of CMOS transistors with RFETs. To show the benefits of our RFET-based approximate compressor, we compare it with the RFET-based exact compressor. Besides, we provide a comparison between the proposed RFET-based approximate compressor and a compressor with the same architecture implemented using CMOS.

TABLE II

HW ANALYSIS OF 4:2 COMPRESSORS

| Type        | Compressor                                 | Delay (ps) | Power (nW) | PDP (aJ) | EDP (ps·aJ) | Area (UST) |
|-------------|--------------------------------------------|------------|------------|----------|-------------|------------|
| Exact       | Conventional (implemented with CMOS LP)    | 36.1       | 321        | 11.6     | 418.8       | 56         |
| Exact       | Conventional (implemented with GeNW RFET)* | 59.9       | 135        | 8.1      | 485.2       | 84         |
| Exact       | Proposed (implemented with GeNW RFET)      | 36.4       | 91         | 3.3      | 120.1       | 42         |
| Approximate | Proposed (implemented with CMOS LP)        | 17.9       | 279        | 5.0      | 89.5        | 30         |
| Approximate | Proposed (implemented with GeNW RFET)      | 22.5       | 84         | 1.9      | 42.7        | 22.5       |

\*Naive RFET-based implementation by 1-to-1 replacement of CMOS transistors with RFETs

TABLE III

HW ANALYSIS OF $8 \times 8$ DADDA MULTIPLIERS

| Type        | Multiplier                                 | Delay (ps) | Power (uW) | PDP (fJ) | EDP (ps·fJ) | Area (UST) |
|-------------|--------------------------------------------|------------|------------|----------|-------------|------------|
| Exact       | Conventional (implemented with CMOS LP)    | 114.4      | 16.8       | 1.92     | 219.6       | 1950       |
| Exact       | Conventional (implemented with GeNW RFET)* | 195.6      | 6.3        | 1.23     | 240.6       | 2925       |
| Exact       | Proposed (implemented with GeNW RFET)      | 183.9      | 5.8        | 1.06     | 194.9       | 1672.5     |
| Approximate | Proposed (implemented with CMOS LP)        | 83.3       | 12.6       | 1.05     | 87.5        | 1450       |
| Approximate | Proposed (implemented with GeNW RFET)      | 94.9       | 5.7        | 0.54     | 51.0        | 1315.5     |

\*Naive RFET-based implementation by 1-to-1 replacement of CMOS transistors with RFETs

The simulation results of the exact and approximate compressors are given in Table II. We analyzed the circuits in terms of delay, power, power-delay product (PDP), energy-delay product (EDP), and area. Delay represents the propagation delay of the critical path of the compressors, and power is the average dynamic power consumption during transitions. In addition, the PDP and EDP provide a more reasonable evaluation of the energy consumption of the circuits. As each three-gate RFET is approximately $1.5 \times$ [28] larger than a single CMOS device due to its extra gate signals, we estimate the circuit area based on the unit size transistor (UST) model as proposed in [28].
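The derived columns of the compressor results can be approximately re-computed from delay, power, and transistor count. The sketch below is illustrative only and the helper names are ours: it assumes PDP = delay × power, EDP = PDP × delay, and an area of 1.5 UST per RFET versus 1 UST per CMOS transistor. The tabulated EDP values appear to be computed from the rounded PDP, so small deviations are expected.

```python
RFET_AREA_FACTOR = 1.5   # three-gate RFET versus one CMOS device, per [28]


def pdp_aj(delay_ps, power_nw):
    """Power-delay product in attojoules (1 ps * 1 nW = 1e-21 J = 1e-3 aJ)."""
    return delay_ps * power_nw * 1e-3


def edp_ps_aj(delay_ps, power_nw):
    """Energy-delay product in ps*aJ."""
    return pdp_aj(delay_ps, power_nw) * delay_ps


def area_ust(transistors, is_rfet):
    """Area estimate in unit-size transistors (UST)."""
    return transistors * (RFET_AREA_FACTOR if is_rfet else 1.0)


# Conventional CMOS exact compressor: 36.1 ps, 321 nW, 2 x 28 transistors.
print(round(pdp_aj(36.1, 321), 1))   # 11.6
# Proposed RFET exact compressor: 28 RFETs.
print(area_ust(28, is_rfet=True))    # 42.0
```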

The proposed RFET-based exact compressor consumes 72% less power than the CMOS counterpart with a negligible rise in delay. Hence, PDP and EDP are improved in our design by 72% and 71%, respectively. Besides, the area is reduced by 25% thanks to the reconfigurability and multi-input support of RFET. The promising power consumption and area results of the proposed RFET-based compressor make it suitable for embedded applications.

To show the strength of our design, we also evaluated the naive RFET-based compressor. According to Table II, although in the naive implementation, the power consumption has decreased, the delay and area have increased significantly compared to the CMOS-based design. Hence, for achieving an efficient RFET-based circuit, 1-to-1 replacement of CMOS transistors with RFETs is not sufficient, and choosing a proper design that can exploit RFET features is crucial.

According to Table II, the proposed RFET-based approximate compressor improves the delay, power, PDP, EDP, and area by 38%, 8%, 42%, 64%, and 46% compared to the RFET-based exact compressor. Since our approximate compressor is designed based on the efficient RFET cells like the Min and MUX-INV, the area and energy efficiency of its RFET implementation is better than its CMOS implementation.
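The quoted improvements follow directly from the Table II entries for the RFET-based exact and approximate compressors. A minimal sketch of that arithmetic, with a helper name of our own choosing:

```python
def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (delay ps, power nW, PDP aJ, EDP ps*aJ, area UST), copied from Table II
exact_rfet = (36.4, 91, 3.3, 120.1, 42)
approx_rfet = (22.5, 84, 1.9, 42.7, 22.5)

gains = [round(reduction_percent(e, a)) for e, a in zip(exact_rfet, approx_rfet)]
print(gains)  # [38, 8, 42, 64, 46]
```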

#### B. Multipliers


1) HW Analysis: We simulated exact and approximate 8×8 Dadda multipliers using the exact and approximate compressors introduced in Section IV-A. All the compressors in the approximate multipliers are approximate. The simulation results of multipliers are given in Table III. Although the CMOS-based multiplier has 38% lower delay than the RFET-based multiplier, in our design, PDP and EDP have been reduced by 45% and 11%, respectively, due to a significant reduction in power consumption by 65%. As CMOS-based AND gates have higher performance than RFET-based AND gates, this increase in delay is partly due to the AND gates in the PPG phase. Comparing the results of our design with naive RFET-based design, we can see again that the 1-to-1 replacement of CMOS transistors with RFETs does not lead to an optimal design.

In comparison to the RFET-based exact multiplier, the RFET-based approximate multiplier reduces delay, power, PDP, EDP, and area by 48%, 2%, 49%, 74%, and 21%, respectively. Moreover, the RFET-based approximate multiplier has better area and energy efficiency than the CMOS-based approximate multiplier with the same structure.

TABLE IV

THE ACCURACY OF THE APPROXIMATE $8 \times 8$ MULTIPLIERS

| Approximate multiplier    | MED    | NED    | ER (%) | Transistor count* |
|---------------------------|--------|--------|--------|-------------------|
| Proposed                  | 1709.7 | 0.0263 | 96.7   | 15                |
| Momeni1 [11]              | 3571.8 | 0.0549 | 99.8   | 28                |
| Momeni2 [11]              | 3283.2 | 0.0505 | 99.4   | 26                |
| Akbari1 ($DQ4:2C_3$) [12] | 2192.9 | 0.0337 | 92.6   | 20                |
| Akbari2 ($DQ4:2C_4$) [12] | 1369.7 | 0.0211 | 77.4   | 30                |
| Venkatachalam [13]        | 1282.9 | 0.0197 | 77.4   | 36                |
| Moaiyeri [4]              | 2033.0 | 0.0313 | 96.9   | 24                |
| Edavoor [1]               | 2489.4 | 0.0383 | 98.2   | 30                |
| Kumar [18]                | 495.07 | 0.0076 | 41.82  | 30                |

\*Required transistors for RFET-based implementation of each compressor

2) Accuracy Analysis of Approximate Multipliers: In this section, first, we introduce the accuracy metrics used to report the accuracy of approximate designs, then evaluate our approximate multiplier's accuracy. Error rate (ER), Mean Error Distance (MED), and Normalized Error Distance (NED) are well-known accuracy metrics [31]. ER is the probability of generating an erroneous output. The MED is defined as:

$$MED = \frac{1}{2^{2N}} \sum_{i=1}^{2^{2N}} |ED_i| \tag{8}$$

where $ED_i$ is the difference between the exact and approximate outputs corresponding to the $i^{th}$ input and $N$ is the bit length of the multiplier inputs. The NED is the mean error distance normalized by the maximum possible error and is defined as:

$$NED = \frac{MED}{D} = \frac{1}{2^{2N}} \sum_{i=1}^{2^{2N}} \frac{|ED_i|}{(2^N - 1)^2} \tag{9}$$

Here, $D$ is the maximum possible error distance of an approximate circuit, which equals $(2^N - 1)^2$ for an $N$-bit multiplier. To evaluate the accuracy of our approximate compressor in multiplication, we compared multipliers implemented using our compressor and other compressors in the literature. For a fair comparison, all the compressors in the multipliers are replaced with approximate compressors, and we do not consider any truncation. We applied all the possible input combinations (65536 inputs) to the approximate multipliers and calculated the accuracy metrics using MATLAB. The results of the accuracy analysis are given in Table IV. According to this table, the accuracy of the proposed multiplier is in the range of other approximate multipliers in the literature. However, if all these designs are implemented using RFETs, our compressor has the lowest hardware complexity. This shows that RFET properties should be considered when designing an efficient RFET-based approximate circuit. Note that the designs with better accuracy require considerably more transistors per compressor.
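The exhaustive evaluation described above can be sketched in a few lines. The paper used MATLAB; the Python version below is our own illustration of Eqs. (8) and (9). Because the multiplier netlist is not reproduced here, `truncating_mul` is a hypothetical stand-in approximate multiplier and not the proposed design.

```python
def accuracy_metrics(approx_mul, n=8):
    """Exhaustively compute (ER in %, MED, NED) for an n-bit approximate multiplier."""
    total = 1 << (2 * n)                  # number of input combinations, 2^(2N)
    max_ed = ((1 << n) - 1) ** 2          # D in Eq. (9)
    wrong = 0
    ed_sum = 0
    for a in range(1 << n):
        for b in range(1 << n):
            ed = abs(a * b - approx_mul(a, b))
            wrong += ed != 0
            ed_sum += ed
    med = ed_sum / total                  # Eq. (8)
    return 100.0 * wrong / total, med, med / max_ed


def truncating_mul(a, b):
    """Stand-in approximate multiplier (NOT the proposed one): clears the 2 LSBs."""
    return (a * b) & ~0b11


er, med, ned = accuracy_metrics(truncating_mul)
print(f"ER = {er:.1f}%, MED = {med:.3f}, NED = {ned:.6f}")
```

Passing a bit-accurate model of an approximate Dadda multiplier as `approx_mul` would give the kind of ER, MED, and NED figures reported in Table IV.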

TABLE V

THE PSNR AND SSIM OF THE APPROXIMATE MULTIPLIERS

| Approximate multiplier      | Avg. PSNR | Avg. SSIM |
|-----------------------------|-----------|-----------|
| Proposed                    | 31.39     | 0.87      |
| Momeni1 [11]                | 23.29     | 0.64      |
| Momeni2 [11]                | 23.80     | 0.68      |
| Akbari1 $(DQ4 : 2C_3)$ [12] | 29.04     | 0.81      |
| Akbari2 $(DQ4: 2C_4)$ [12]  | 33.45     | 0.93      |
| Venkatachalam [13]          | 33.50     | 0.93      |
| Moaiyeri [4]                | 26.32     | 0.76      |
| Edavoor [1]                 | 23.8      | 0.79      |
| Kumar [18]                  | 41.16     | 0.98      |



Fig. 7. Image multiplication sample a) Proposed b) Momeni1 [11] c) Momeni2 [11] d) Akbari1 [12] e) Akbari2 [12] f) Venkatachalam [13] g) Moaiyeri [4] h) Edavoor [1] i) Kumar [18]

3) Approximate Multiplier in the Image Multiplication: To evaluate the efficiency of the approximate multipliers in a widely used real application, we utilized them in the image multiplication. The standard images from [32] are selected as the test dataset, and the MATLAB environment is employed for pixel-to-pixel multiplication using the approximate multipliers. Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Metric (SSIM) are the primary metrics for assessing image quality. PSNR is based on the pixel-by-pixel comparison, whereas SSIM extracts the structural similarity of two images regarding the human visual system [33]. Table V shows the PSNR and SSIM values for the image multiplication samples. The PSNR and SSIM values of the proposed approximate multiplier are 31.39 and 0.87, respectively, which are acceptable compared to the other works in the literature. Besides, to visually show the effectiveness of the approximate multipliers, the result of an image multiplication sample is given in Fig. 7.
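For reference, PSNR can be computed as in the sketch below. It uses the common definition $10 \log_{10}(peak^2 / MSE)$ on flat pixel lists; the paper does not state how the 16-bit products are scaled before comparison, so the `peak` value is an assumption left to the caller, and all function names are ours. SSIM is omitted because it requires windowed statistics.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized images given as flat pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf                   # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def multiply_images(img_a, img_b, mul):
    """Pixel-to-pixel product of two 8-bit images using the multiplier `mul`."""
    return [mul(p, q) for p, q in zip(img_a, img_b)]


# Tiny made-up example: exact product versus a product with its LSB cleared.
a, b = [12, 200, 45, 255], [7, 31, 99, 255]
exact = multiply_images(a, b, lambda p, q: p * q)
approx = multiply_images(a, b, lambda p, q: (p * q) & ~1)
print(psnr(exact, approx, peak=255 * 255))
```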

## V. CONCLUSION

In this paper, a compact and low-power exact 4:2 compressor is proposed that exploits the device-level reconfigurability and multi-input support of RFET. We show that the 1-to-1 replacement of CMOS transistors with RFETs may not always lead to an optimal circuit, and choosing a proper design that can exploit RFET features is crucial. We also designed a Dadda multiplier employing the proposed compressor and RFET-based logic cells. We chose GeNW among other RFET models to implement our multiplier because it provides good performance along with low power consumption. Based on the circuit-level simulations, our multiplier reduces power, PDP, and EDP by 65%, 45%, and 11%, respectively, compared to the CMOS-based design. Additionally, we proposed a novel approximate compressor based on efficient RFET logic cells to show the benefits of RFETs in designing approximate circuits for error-resilient applications. The approximate multiplier built from this compressor achieves average PSNR and SSIM values of 31.39 and 0.87, respectively.

## REFERENCES

- [1] P. J. Edavoor, S. Raveendran, and A. D. Rahulkar, "Approximate multiplier design using novel dual-stage 4:2 compressors," *IEEE Access*, 2020.
- [2] A. Arasteh, M. H. Moaiyeri, M. Taheri, K. Navi, and N. Bagherzadeh, "An energy and area efficient 4:2 compressor based on FinFETs," *Integration*, 2018.
- [3] M. D. Gavaber, M. Poorhosseini, and S. Pourmozafari, "Novel architecture for low-power CNTFET-based compressors," *JCSC*, 2019.
- [4] M. H. Moaiyeri, F. Sabetzadeh, and S. Angizi, "An efficient majority-based compressor for approximate computing in the nano era," *Microsystem Technologies*, 2018.
- [5] M. Simon, A. Heinzig, J. Trommer, T. Baldauf, T. Mikolajick, and W. M. Weber, "Bringing reconfigurable nanowire FETs to a logic circuits compatible process platform," in *IEEE NMDC*, 2016.
- [6] T. Mikolajick, A. Heinzig, J. Trommer, T. Baldauf, and W. Weber, "The rfet—a reconfigurable nanowire transistor and its application to novel electronic circuits and systems," *Semiconductor Science and Technology*, 2017.
- [7] M. Raitza, A. Kumar, M. Völp, D. Walter, J. Trommer, T. Mikolajick, and W. M. Weber, "Exploiting transistor-level reconfiguration to optimize combinational circuits," in *IEEE DATE*, 2017.
- [8] S. Rai, J. Trommer, M. Raitza, T. Mikolajick, W. M. Weber, and A. Kumar, "Designing efficient circuits based on runtime-reconfigurable field-effect transistors," *IEEE TVLSI*, 2018.
- [9] J. Zhang, X. Tang, P.-E. Gaillardon, and G. De Micheli, "Configurable circuits featuring dual-threshold-voltage design with three-independent-gate silicon nanowire fets," *IEEE TCAS-I*, 2014.
- [10] L. Amarú, P.-E. Gaillardon, J. Zhang, and G. De Micheli, "Power-gated differential logic style based on double-gate controllable-polarity transistors," *IEEE TCAS-II: Express Briefs*, 2013.
- [11] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," *IEEE TC*, 2015.
- [12] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, "Dual-quality 4: 2 compressors for utilizing in dynamic accuracy configurable multipliers," *IEEE TVLSI*, 2017.
- [13] S. Venkatachalam and S.-B. Ko, "Design of power and area efficient approximate multipliers," *IEEE TVLSI*, 2017.
- [14] F. Sabetzadeh, M. H. Moaiyeri, and M. Ahmadinejad, "An ultra-efficient approximate multiplier with error compensation for error-resilient applications," *IEEE TCAS-II*, 2022.
- [15] M. Ha and S. Lee, "Multipliers with approximate 4–2 compressors and error recovery modules," *IEEE ESL*, 2017.
- [16] A. G. Strollo, D. De Caro, E. Napoli, N. Petra, and G. Di Meo, "Low-power approximate multiplier with error recovery using a new approximate 4-2 compressor," in *IEEE ISCAS*, 2020.
- [17] H. Pei, X. Yi, H. Zhou, and Y. He, "Design of ultra-low power consumption approximate 4–2 compressors based on the compensation characteristic," *IEEE TCAS-II*, 2020.
- [18] U. A. Kumar, S. K. Chatterjee, and S. E. Ahmed, "Low-power compressor-based approximate multipliers with error correcting module," *IEEE ESL*, 2021.
- [19] J. Trommer, A. Heinzig, A. Heinrich, P. Jordan, M. Grube, S. Slesazeck, T. Mikolajick, and W. M. Weber, "Material prospects of reconfigurable transistor (rfets)–from silicon to germanium nanowires," *MRS OPL*, 2014.
- [20] J. Trommer, A. Heinzig, U. Muhle, M. Loffler, A. Winzer, P. M. Jordan, J. Beister, T. Baldauf, M. Geidel, B. Adolphi *et al.*, "Enabling energy efficiency and polarity control in germanium nanowire transistors by individually gated nanojunctions," *ACS nano*, 2017.
- [21] J. N. Quijada, T. Baldauf, S. Rai, A. Heinzig, A. Kumar, W. M. Weber, T. Mikolajick, and J. Trommer, "A germanium nanowire reconfigurable transistor model for predictive technology evaluation," *IEEE TNANO*, 2022.
- [22] J. Trommer, A. Heinzig, T. Baldauf, S. Slesazeck, T. Mikolajick, and W. M. Weber, "Functionality-enhanced logic gate design enabled by symmetrical reconfigurable silicon nanowire transistors," *IEEE TNANO*, 2015.
- [23] A. Heinzig, S. Slesazeck, F. Kreupl, T. Mikolajick, and W. M. Weber, "Reconfigurable silicon nanowire transistors," *Nano letters*, 2012.
- [24] M. De Marchi, D. Sacchetto, S. Frache, J. Zhang, P.-E. Gaillardon, Y. Leblebici, and G. De Micheli, "Polarity control in double-gate, gate-all-around vertically stacked silicon nanowire fets," in *IEEE IEDM*, 2012.
- [25] M. Simon, J. Trommer, B. Liang, D. Fischer, T. Baldauf, M. Khan, A. Heinzig, M. Knaut, Y. Georgiev, A. Erbe *et al.*, "A wired-and transistor: Polarity controllable fet with multiple inputs," in *IEEE DRC*, 2018.
- [26] M. Alioto and G. Palumbo, "Analysis and comparison on full adder block in submicron technology," *IEEE TVLSI*, 2002.
- [27] J. Romero-González and P.-E. Gaillardon, "An efficient adder architecture with three-independent-gate field-effect transistors," in *IEEE ICRC*, 2018.
- [28] M. M. Sharifi, R. Rajaei, P. Cadareanut, P.-E. Gaillardon, Y. Jin, M. Niemier, and X. S. Hu, "A novel tigfet-based dff design for improved resilience to power sidechannel attacks," in *IEEE DATE*, 2020.
- [29] ASU, "Ptm library," 2022. [Online]. Available: http://ptm.asu.edu/
- [30] S. Sinha, G. Yeric, V. Chandra, B. Cline, and Y. Cao, "Exploring sub-20nm finfet design with predictive technology models," in ACM/IEEE DAC, 2012.
- [31] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," *IEEE TC*, 2013.
- [32] USC, "Sipi," 2022. [Online]. Available: https://sipi.usc.edu/database/
- [33] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," *IEEE TIP*, 2004.